
feat: expose attention_type parameter in Llama.__init__#2143

Draft
jamesbiederbeck wants to merge 1 commit into abetlen:main from jamesbiederbeck:expose-attention-type

Conversation

@jamesbiederbeck

llama_context_params already contains an attention_type field and
llama_cpp.py defines the LLAMA_ATTENTION_TYPE_* constants, but
Llama.__init__ does not expose this parameter.

This makes it impossible to select non-causal attention from Python,
which is required for embedding models trained with bidirectional
attention (e.g. GTE/Qwen embedding models).

This PR wires the parameter through to self.context_params.attention_type,
mirroring how pooling_type is handled.
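A minimal sketch of the wiring, using a plain stand-in class for `llama_context_params` so it runs without a model (the real field and `LLAMA_ATTENTION_TYPE_*` constants live in `llama_cpp.llama_cpp`; the stand-in names and default values here are assumptions based on the PR text and llama.h):

```python
from dataclasses import dataclass

# Values mirroring llama.h's llama_attention_type enum (assumed here).
LLAMA_ATTENTION_TYPE_UNSPECIFIED = -1
LLAMA_ATTENTION_TYPE_CAUSAL = 0
LLAMA_ATTENTION_TYPE_NON_CAUSAL = 1


@dataclass
class ContextParams:
    # Stand-in for llama_context_params; the real ctypes struct
    # already carries an attention_type field.
    attention_type: int = LLAMA_ATTENTION_TYPE_UNSPECIFIED
    pooling_type: int = -1


class Llama:
    def __init__(
        self,
        *,
        attention_type: int = LLAMA_ATTENTION_TYPE_UNSPECIFIED,
        **kwargs,
    ):
        self.context_params = ContextParams()
        # The change the PR describes: pass the new parameter through
        # to the context params, mirroring how pooling_type is handled.
        self.context_params.attention_type = attention_type


model = Llama(attention_type=LLAMA_ATTENTION_TYPE_NON_CAUSAL)
print(model.context_params.attention_type)  # 1
```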

Example usage:

```python
from llama_cpp import Llama
from llama_cpp.llama_cpp import LLAMA_ATTENTION_TYPE_NON_CAUSAL

model = Llama(
    model_path="model.gguf",
    embedding=True,
    attention_type=LLAMA_ATTENTION_TYPE_NON_CAUSAL,
)
```

